
    Entropic Optimal Transport in Machine Learning: applications to distributional regression, barycentric estimation and probability matching

    Regularised optimal transport theory has been gaining increasing interest in machine learning as a versatile tool for handling and comparing probability measures. Entropy-based regularisations, known as Sinkhorn divergences, have proved successful in a wide range of applications: as a metric for clustering and barycenter estimation, as a tool to transfer information in domain adaptation, and as a fitting loss for generative models, to name a few. Given this success, it is crucial to investigate the statistical and optimization properties of such models: these aspects are instrumental in designing new, principled paradigms that further advance the field. Nonetheless, questions on the asymptotic guarantees of estimators based on Entropic Optimal Transport have received less attention. In this thesis we target such questions, focusing on three major settings where Entropic Optimal Transport has been used: learning histograms in supervised frameworks, barycenter estimation and probability matching. We present the first consistent estimator for learning with the Sinkhorn loss in supervised settings, with explicit excess risk bounds. We propose a novel algorithm for Sinkhorn barycenters that handles arbitrary probability distributions with provable global convergence guarantees. Finally, we address generative models with the Sinkhorn divergence as loss function: we analyse the role of the latent distribution and the generator from a modelling and statistical perspective. We propose a method that learns the latent distribution and the generator jointly, and we characterize the generalization properties of this estimator. Overall, the tools developed in this work contribute to the understanding of the theoretical properties of Entropic Optimal Transport and their versatility in machine learning.
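To make the terminology concrete, here is a minimal NumPy sketch of the debiased Sinkhorn divergence between two discrete measures. This is an illustration of the standard construction only, not the thesis code; the function names and the choice of the transport cost ⟨P, C⟩ as the value (rather than a convention that includes the entropy term) are our own.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iters=300):
    """Entropic OT between histograms a, b with cost matrix C:
    run Sinkhorn fixed-point iterations, return the transport cost <P, C>.
    (Other conventions also include the entropy term in the value.)"""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # entropic transport plan
    return float(np.sum(P * C))

def sinkhorn_divergence(a, b, C_ab, C_aa, C_bb, eps=0.1):
    """Debiased divergence S(a, b) = OT(a, b) - (OT(a, a) + OT(b, b)) / 2,
    which vanishes when the two measures coincide."""
    return (sinkhorn_cost(a, b, C_ab, eps)
            - 0.5 * (sinkhorn_cost(a, a, C_aa, eps)
                     + sinkhorn_cost(b, b, C_bb, eps)))

# two identical uniform measures on {0, 1}: divergence is exactly zero
x = np.array([0.0, 1.0])
a = np.array([0.5, 0.5])
C = (x[:, None] - x[None, :]) ** 2
print(sinkhorn_divergence(a, a, C, C, C))  # → 0.0
```

The debiasing terms OT(a, a) and OT(b, b) are what distinguish the Sinkhorn divergence from plain entropic OT: without them, the entropic bias makes the loss nonzero even when the two measures are equal.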

    Leveraging Low-Rank Relations Between Surrogate Tasks in Structured Prediction

    We study the interplay between surrogate methods for structured prediction and techniques from multitask learning designed to leverage relationships between surrogate outputs. We propose an efficient algorithm based on trace norm regularization which, unlike previous methods, does not require explicit knowledge of the coding/decoding functions of the surrogate framework. As a result, our algorithm can be applied to the broad class of problems in which the surrogate space is large or even infinite-dimensional. We derive excess risk bounds for trace norm regularized structured prediction, implying consistency and learning rates for our estimator. We also identify relevant regimes in which our approach can enjoy better generalization performance than previous methods. Numerical experiments on ranking problems indicate that enforcing low-rank relations among surrogate outputs may indeed provide a significant advantage in practice.
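As a rough illustration of the mechanism (not the paper's algorithm, whose surrogate space may be infinite-dimensional), trace norm regularization on a finite multitask least-squares problem can be solved by proximal gradient descent with singular value thresholding. All names, sizes and parameter choices below are our own toy setup.

```python
import numpy as np

def svt(W, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_* ."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def trace_norm_regression(X, Y, lam=0.01, n_iters=500):
    """Proximal gradient for min_W ||XW - Y||_F^2 / (2n) + lam * ||W||_* .
    The trace (nuclear) norm penalty drives W toward low rank,
    coupling the columns (tasks) of W."""
    n, d = X.shape
    T = Y.shape[1]
    step = n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    W = np.zeros((d, T))
    for _ in range(n_iters):
        grad = X.T @ (X @ W - Y) / n
        W = svt(W - step * grad, step * lam)
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
W_true = rng.standard_normal((10, 1)) @ rng.standard_normal((1, 5))  # rank-1 task matrix
Y = X @ W_true
W_hat = trace_norm_regression(X, Y)
```

With noiseless rank-1 targets, the recovered `W_hat` is itself (numerically) rank one: the thresholding step zeroes the spurious singular values, which is the low-rank coupling between tasks that the abstract refers to.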

    Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

    We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach populates the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has a Lipschitz continuous gradient with respect to the total variation norm, and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice.
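The Frank-Wolfe template that the algorithm builds on can be sketched on a toy problem: minimizing a smooth function over the probability simplex, where the linear minimization oracle returns a single vertex per iteration — mirroring how the barycenter support is populated one atom at a time. This is a generic illustration under our own toy setup, not the paper's barycenter solver.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_iters=200):
    """Generic Frank-Wolfe over the probability simplex.
    The linear minimization oracle over the simplex is trivial: it picks
    the vertex e_i with the most negative gradient coordinate, so each
    iteration adds (at most) one new atom to the support of the iterate."""
    x = x0.copy()
    for k in range(n_iters):
        g = grad(x)
        i = int(np.argmin(g))           # LMO: best vertex of the simplex
        s = np.zeros_like(x)
        s[i] = 1.0
        gamma = 2.0 / (k + 2)           # standard O(1/k) step-size schedule
        x = (1 - gamma) * x + gamma * s  # iterates stay inside the simplex
    return x

# toy objective: f(x) = ||x - t||^2 / 2 with target t already on the simplex
t = np.array([0.2, 0.5, 0.3])
x = frank_wolfe_simplex(lambda x: x - t, np.array([1.0, 0.0, 0.0]))
```

The standard Frank-Wolfe bound gives f(x_k) - f* ≤ 2LD²/(k+2); for this 1-smooth quadratic on a simplex of diameter √2, that is below 0.02 after 200 iterations, so `x` ends close to `t`.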

    Differential Properties of Sinkhorn Approximation for Learning with Wasserstein Distance

    Applications of optimal transport have recently gained remarkable attention thanks to the computational advantages of entropic regularization. However, in most situations the Sinkhorn approximation of the Wasserstein distance is replaced by a regularized version that is less accurate but easier to differentiate. In this work we characterize the differential properties of the original Sinkhorn distance, proving that it enjoys the same smoothness as its regularized version, and we provide an efficient algorithm to compute its gradient explicitly. We show that this result benefits both theory and applications: on the one hand, higher-order smoothness confers statistical guarantees on learning with Wasserstein approximations; on the other hand, the gradient formula allows us to efficiently solve learning and optimization problems in practice. Promising preliminary experiments complement our analysis.
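The ease of differentiating the regularized version can be illustrated directly: for entropy-regularized OT, the gradient with respect to the first marginal is simply the dual Sinkhorn potential (by the envelope theorem), which the fixed-point iterations deliver for free. The sketch below is our own toy verification of this fact against finite differences, not the paper's algorithm for the sharp Sinkhorn gradient.

```python
import numpy as np

def ot_eps(a, b, C, eps=0.2, n_iters=1000):
    """Entropy-regularized OT value (dual form) and the potential f = eps*log(u).
    For a, b on the simplex, grad_a OT_eps(a, b) = f up to an additive constant,
    so directional derivatives along mass-preserving directions equal <f, d>."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    f, g = eps * np.log(u), eps * np.log(v)
    val = a @ f + b @ g - eps   # dual objective at the Sinkhorn fixed point
    return float(val), f

rng = np.random.default_rng(0)
x, y = rng.random(4), rng.random(4)
C = (x[:, None] - y[None, :]) ** 2
a = np.full(4, 0.25)
b = np.full(4, 0.25)

# central finite difference along a direction d that sums to zero
d = np.array([1.0, -1.0, 0.5, -0.5]) * 1e-4
v_plus, _ = ot_eps(a + d, b, C)
v_minus, _ = ot_eps(a - d, b, C)
_, f = ot_eps(a, b, C)
print(abs((v_plus - v_minus) / 2 - f @ d))  # ≈ 0
```

The point of the paper is that the analogous formula for the *sharp* Sinkhorn distance ⟨P, C⟩ (without the entropy term in the value) is not this simple; deriving an efficient gradient for it is the contribution.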

    Web 2.0, Language Learning and Intercultural Competence

    Whenever a new form of communication appears on the scene, it immediately becomes the object of discussion. This has been going on since the first penny press edition in 1834; today, the discussion concerns the Internet. The resilience with which mass media have faced criticism can be well understood through functionalist analysis, which considers the media a social system operating within an external system made up of cultural and social conditions. In spite of its complexity, any set of repetitive actions contributes to maintaining or weakening the stability of the system. We can say that globalization would not have been possible without the media, and Web 2.0 may be of remarkable interest for its role in influencing cultural identity. Past technologies, from electric light to the airplane, took a whole generation to gain ground among people; the Internet has not required such a long time. The inability to digest the new modalities of communication offered by the net creates the risk of unexpected contamination: geographical magazines often show pictures of native Amazonians dressed in their traditional costumes while using computers and mobile phones. Educational uses of Web 2.0 and mobile learning tools have expanded rapidly over the last few years, and a great number of projects have been planned for teaching languages. Mobile learning spans many devices: handheld computers, MP3 players, notebooks and mobile phones. In this paper we outline the methodology, including the selection of web tools, task design, implementation and intercultural communication. The study carried out at the University of Florence shows that learners develop their communicative competence while performing entertaining activities that enable them to achieve the desired goals.

    Aligning Time Series on Incomparable Spaces

    Dynamic time warping (DTW) is a useful method for aligning, comparing and combining time series, but it requires them to live in comparable spaces. In this work, we consider a setting in which time series live in different spaces without a sensible ground metric, causing DTW to become ill-defined. To alleviate this, we propose Gromov dynamic time warping (GDTW), a distance between time series on potentially incomparable spaces that avoids the comparability requirement by instead considering intra-relational geometry. We demonstrate its effectiveness at aligning, combining and comparing time series living on incomparable spaces. We further propose a smoothed version of GDTW as a differentiable loss and assess its properties in a variety of settings, including barycentric averaging, generative modeling and imitation learning.
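For context, classic DTW — the object that GDTW generalizes — reduces to a simple dynamic program over pairwise ground-metric costs; this is the standard textbook recursion, not the paper's method.

```python
import numpy as np

def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Classic dynamic time warping between sequences x and y.
    D[i, j] holds the cost of the best monotone alignment of x[:i] with y[:j];
    each cell extends an alignment by a match, an insertion, or a deletion.
    Note the ground metric `dist` requires x and y to live in the same space,
    which is exactly the assumption GDTW removes."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist(x[i - 1], y[j - 1]) + min(
                D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

# a series and a time-warped copy of it align at zero cost
x = [0.0, 1.0, 2.0, 1.0, 0.0]
y = [0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0]
print(dtw(x, y))  # → 0.0
```

Because DTW only ever evaluates `dist` between a point of `x` and a point of `y`, it is undefined when the two series live in incomparable spaces; GDTW sidesteps this by comparing intra-series distances instead.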